
Add common cache and per-build cache #62

Merged
devrandom merged 2 commits into devrandom:master from theuni:cache
Aug 7, 2014

Conversation

@theuni
Contributor

theuni commented Jul 26, 2014

This is merely a POC to start a discussion. I'm sure there's a nicer way of achieving the same thing.

Allow each builder to cache some files for re-use in the next build. This allows for poor-man's dependency chaining.

Additionally, add a common cache pool for all builds. This can be used for saving (for example) downloaded files to be shared between builds.
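One way to picture the two caches described above: a minimal Python sketch, assuming a hypothetical on-disk layout where `<cache_root>/common` is the pool shared by every build and `<cache_root>/<build_name>` belongs to a single descriptor. The directory names and the `cache_dirs` helper are illustrative only, not gitian-builder's actual layout.

```python
from __future__ import annotations

from pathlib import Path


def cache_dirs(cache_root: Path, build_name: str) -> tuple[Path, Path]:
    """Return (common_cache, per_build_cache) for one build, creating both.

    Hypothetical layout: <cache_root>/common is shared by every build, while
    <cache_root>/<build_name> is private to a single descriptor.
    """
    if build_name == "common":
        # This name would alias the shared pool; refuse rather than mix the two.
        raise ValueError("'common' is reserved for the shared cache")
    common = cache_root / "common"
    per_build = cache_root / build_name
    for directory in (common, per_build):
        directory.mkdir(parents=True, exist_ok=True)
    return common, per_build
```

Rejecting `common` as a build name keeps one descriptor's private cache from silently turning into the shared pool.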

Needed for the Bitcoin build process overhaul. I'll link the PR for discussion once it's posted.

@devrandom
Owner

Interesting!

What is the advantage of the common cache? It seems like it would be better for each cache item to be attributable to a specific package.

@theuni
Contributor Author

theuni commented Jul 30, 2014

I'm sure there are other cases, but in this particular use, common cache is helpful for descriptors that fetch their own sources.

It's used in the Bitcoin overhaul because there's a buildsystem shared between gitian and the pull-tester. Since the buildsystem fetches and verifies its own sources anyway, there's no need to include them as gitian inputs. And since several descriptors share sources, it'd be senseless to fetch them for each one.
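A rough sketch of the "fetch once, verify, share" idea, assuming sources are pinned by SHA-256 and stored in the common cache under their file name. `fetch_verified` and its `download` callback are hypothetical stand-ins; the real Bitcoin dependency builder may fetch and verify differently.

```python
import hashlib
from pathlib import Path
from typing import Callable


def fetch_verified(common_cache: Path, filename: str, sha256: str,
                   download: Callable[[], bytes]) -> Path:
    """Return a verified source file from the shared cache, downloading it at most once.

    `download` is a hypothetical callback standing in for the real fetch step.
    """
    common_cache.mkdir(parents=True, exist_ok=True)
    target = common_cache / filename
    from_cache = target.exists()
    data = target.read_bytes() if from_cache else download()

    # Verify both fresh downloads and cached copies before anyone uses them.
    if hashlib.sha256(data).hexdigest() != sha256:
        raise ValueError(f"checksum mismatch for {filename}")

    if not from_cache:
        # Write under a temporary name, then rename, so an interrupted write
        # never leaves a truncated file that later builds would trust.
        tmp = target.with_name(target.name + ".part")
        tmp.write_bytes(data)
        tmp.replace(target)
    return target
```

The cached copy is re-hashed on every use, so a corrupted file in the shared pool fails loudly instead of propagating into every descriptor that reads it.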

@devrandom
Owner

A couple of comments:

  • Normally, sources should go in the inputs directory
  • I originally envisioned the actual build process as having no network access. I'm surprised that any descriptors download sources as part of the build.

I'm puzzled by "Since the buildsystem fetches and verifies its own sources anyway, there's no need to include them as gitian inputs". What is the downside to having the pull-tester place the sources in the inputs directory?

@theuni
Contributor Author

theuni commented Jul 31, 2014

1: I agree, but this work is a bit outside the box. I'll try to show below why I went this route.
2: Again, agreed. But the sources only download on the first run since they're cached after that.

Neither of those is desirable, but they were sacrifices I made in order to unify things.

Would you mind giving the description of bitcoin/bitcoin#4592 a quick read? I'd like to give a real example of how all of this ties together. There's a lot going on, so I'll try to summarize as briefly as possible (hint: it won't actually be brief ;)

In the past, for Bitcoin, there's been a disconnect between what devs run, what the pull-tester tests, and what Gitian builds. I've attempted to unify those things so that the pull-tester is able to build/test exactly what Gitian will produce, minus the deterministic guarantees.

To do this, I created a build system for them to share. It builds all dependencies as needed and caches each result individually. So if libfoo's build recipe (or the build system itself) hasn't changed since the last run, libfoo won't be rebuilt; the cached result is simply unpacked. This system is deterministic in its own right... the details are a bit complex, but you can assume that to be true.
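The "only rebuild when the recipe or the build system changed" rule can be expressed as a cache key. This is a minimal sketch assuming a content hash over the recipe text, a build-system version string, and the keys of the package's dependencies; `package_key` is a hypothetical helper, not the dependency builder's actual scheme.

```python
from __future__ import annotations

import hashlib
import json


def package_key(recipe: str, build_system_version: str,
                dependency_keys: dict[str, str]) -> str:
    """Derive a cache key for one package's build result (hypothetical scheme).

    The key changes whenever the package's recipe, the build system, or the
    key of any dependency changes, so a stale artifact is never reused.
    """
    payload = json.dumps(
        {
            "recipe": recipe,
            "build_system": build_system_version,
            "deps": dependency_keys,
        },
        sort_keys=True,  # stable key ordering gives a stable hash
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Folding each dependency's key into its dependents' keys is what makes a change ripple upward: a new libfoo key yields a new key for everything built against it, while untouched packages keep their old keys and are simply unpacked.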

With that done, the pull-tester and Gitian can store the build-results and reuse them, rather than rebuilding each dependency every time. See here for how this is actually happening:

pull-tester: https://github.com/coryfields/bitcoin/blob/master/.travis.yml
gitian: https://github.com/coryfields/bitcoin/blob/master/contrib/gitian-descriptors/gitian-linux.yml

Note how both of them call "make -C depends", then use those results to build bitcoin.

The result is that our gitian descriptors can stay static... we don't have to sync them up with anything, and yet we know that they'll build the same thing that the pull-tester did for any particular commit. If a dependency needs to be changed, it's changed in the dependency builder.

So, all that said, here's an example of it in action:
https://github.com/coryfields/bitcoin/pull/3/files
If you check out the build-log, you'll see what's happening: https://travis-ci.org/coryfields/bitcoin/builds/31358210

Notice that the new version of qrencode was built/fetched/installed. Since nothing else depends on qrencode, nothing else had to be rebuilt. If (for example) qt had depended on qrencode, it would've been rebuilt as well against the new qrencode.
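The set of packages to rebuild after a change is the changed packages plus everything that depends on them, directly or transitively. A sketch, assuming the dependency graph is available as a plain mapping from each package to its dependencies; the graph below is made up for illustration and is not Bitcoin's real dependency tree.

```python
from __future__ import annotations

from collections import deque


def packages_to_rebuild(deps: dict[str, set[str]], changed: set[str]) -> set[str]:
    """Return `changed` plus every package that depends on it, directly or transitively.

    `deps` maps each package name to the names of the packages it depends on.
    """
    # Invert the graph: for each package, which packages depend on it?
    dependents: dict[str, set[str]] = {name: set() for name in deps}
    for pkg, reqs in deps.items():
        for req in reqs:
            dependents.setdefault(req, set()).add(pkg)

    # Breadth-first walk upward from the changed packages.
    rebuild = set(changed)
    queue = deque(changed)
    while queue:
        pkg = queue.popleft()
        for parent in dependents.get(pkg, ()):
            if parent not in rebuild:
                rebuild.add(parent)
                queue.append(parent)
    return rebuild


# Hypothetical graph: qt depends only on libfoo, so bumping qrencode
# rebuilds qrencode alone.
graph = {"qrencode": set(), "libfoo": set(), "qt": {"libfoo"}}
print(sorted(packages_to_rebuild(graph, {"qrencode"})))  # ['qrencode']

# If qt also depended on qrencode, it would be rebuilt too.
graph["qt"].add("qrencode")
print(sorted(packages_to_rebuild(graph, {"qrencode"})))  # ['qrencode', 'qt']
```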

If I use gitian to build that commit, the exact same thing will happen. Any cached results from previous builds will be used so that only qrencode will have to rebuild.

The end result is a guarantee that gitian will build exactly what the CI is building, with no (or very little, I hope) chance of deviation. So this commit is all it takes for us to bump that dependency, have it built/verified, and have it present in a release, all in less than 10 minutes.

So... I would very much like to maintain that behavior. IMO, it's a huge feature for us. However, the two-cache system is admittedly very kludgy. Do you have any suggestions on how it could be done more elegantly?

@theuni
Contributor Author

theuni commented Jul 31, 2014

To clarify, in case I didn't above, the pull-tester and Gitian know nothing of each other. The pull-tester runs automatically via some cloud magic, and devs use Gitian manually. I realize that it may read as though the pull-tester is using (or aware of) Gitian in some way, but that's not the case. The common factor is the dependency builder.

@theuni
Contributor Author

theuni commented Aug 6, 2014

@devrandom Any thoughts on the above? The bitcoin dependency builder is nearly merge-ready, and I'd like to have a plan for dealing with Gitian.

@devrandom
Owner

I think a build artifact cache is likely to be a good direction. However, I'd like to make sure it's clear how to use it. Would it be possible to articulate exactly how each type of cache is meant to be used?

@theuni
Contributor Author

theuni commented Aug 6, 2014

Sure. I'll describe exactly how I've used it for bitcoin, though I'm sure there are other use-cases.

Before the cache:

Descriptor 1: Windows

  • Input: libfoo-source.tar.gz
  • Output: libfoo

Descriptor 2: Windows

  • Input: git repo bar
  • Input: Output of Descriptor 1
  • Output: program bar.

Descriptor 3: Linux

  • Input: libfoo-source.tar.gz
  • Output: libfoo

Descriptor 4: Linux

  • Input: git repo bar
  • Input: Output of Descriptor 3
  • Output: program bar.

Process: User builds descriptors 1 and 3, saves the outputs, copies them to inputs, then builds descriptors 2 and 4.

With the cache:

Descriptor 1: Windows

  • Cache: Check global cache for libfoo-source.tar.gz. If it doesn’t exist, fetch it.
  • Cache: Check per-build cache for libfoo. If it doesn’t exist, build it.
  • Output: program bar. Store libfoo in the per-build cache and libfoo-source.tar.gz in the global cache.
  • Input: None

Descriptor 2: Linux

  • Cache: Check global cache for libfoo-source.tar.gz. If it doesn’t exist, fetch it.
  • Cache: Check per-build cache for libfoo. If it doesn’t exist, build it.
  • Output: program bar. Store libfoo in the per-build cache and libfoo-source.tar.gz in the global cache.
  • Input: None

Process: User builds descriptors 1 and 2. If cached versions of libfoo are found from previous gitian runs, they will be used instead of rebuilding.

Note that the logic to determine whether a cached version can be reused is not handled here; that's up to the user to work out.

libfoo-source.tar.gz is only fetched once because when the 2nd descriptor is run, the 1st descriptor will have already put it in the global cache.
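The "with the cache" walkthrough can be simulated end to end. A minimal sketch, assuming a hypothetical layout (`<cache_root>/common` for sources, `<cache_root>/<build_name>` for built artifacts) and stub callbacks in place of a real download and compile; `run_descriptor` and its parameters are illustrative only.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable


def run_descriptor(cache_root: Path, build_name: str,
                   fetch_source: Callable[[], bytes],
                   build_libfoo: Callable[[bytes, str], bytes]) -> bytes:
    """Simulate one descriptor run against a shared cache and a per-build cache."""
    common = cache_root / "common"
    own = cache_root / build_name
    common.mkdir(parents=True, exist_ok=True)
    own.mkdir(parents=True, exist_ok=True)

    # 1. Source: reuse it from the common cache, or fetch it and store it there.
    src_path = common / "libfoo-source.tar.gz"
    if not src_path.exists():
        src_path.write_bytes(fetch_source())
    source = src_path.read_bytes()

    # 2. Artifact: reuse it from this build's cache, or build it and store it there.
    lib_path = own / "libfoo.tar.gz"
    if not lib_path.exists():
        lib_path.write_bytes(build_libfoo(source, build_name))
    libfoo = lib_path.read_bytes()

    # 3. Build the final program against libfoo (stubbed out here).
    return b"bar linked against " + libfoo
```

Running the Windows descriptor and then the Linux one fetches the source once and builds libfoo once per platform; running either again builds nothing. As noted above, deciding whether a cached libfoo is still valid is left to the user, and this sketch reuses whatever it finds.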

@devrandom
Owner

Okay, putting a concise version of this in the doc directory would be helpful. I think we can recommend that sources go in the common cache, and binary build artifacts go in the build-specific cache.

I just realized that the gitian build process can be run without a network connection if the cached sources are present.

So I can go ahead and accept the pull request once you add the docs, unless you have further thoughts.

@theuni
Contributor Author

theuni commented Aug 6, 2014

Yea, the cache can be pre-seeded to mimic the use of inputs. I was tempted to use the inputs dir itself for the global cache, but I think that might lead to some nasty accidents.
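Pre-seeding could be as simple as copying already-downloaded sources into the shared pool before starting a build, after which the descriptor finds them cached and never touches the network. A sketch, assuming a hypothetical `<cache_root>/common` directory as the shared pool; `preseed_common_cache` is illustrative and not part of gitian-builder.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def preseed_common_cache(common_cache: Path, sources: list[Path]) -> list[Path]:
    """Copy already-downloaded source files into the shared cache.

    Returns the newly seeded paths. Files already present are left alone,
    so re-running the seed step is harmless.
    """
    common_cache.mkdir(parents=True, exist_ok=True)
    seeded = []
    for src in sources:
        dest = common_cache / src.name
        if not dest.exists():
            shutil.copy2(src, dest)  # copies contents and metadata (mtime, mode)
            seeded.append(dest)
    return seeded
```

Keeping the seed step separate from the inputs directory also matches the concern above: nothing in the cache is ever written back into inputs by accident.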

I'll do up some docs. Thanks for hearing me out!

@theuni
Contributor Author

theuni commented Aug 7, 2014

@devrandom added a quick readme.

theuni added 2 commits August 7, 2014 13:01
devrandom pushed a commit that referenced this pull request Aug 7, 2014
Add common cache and per-build cache
devrandom merged commit 9092f98 into devrandom:master Aug 7, 2014